{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "96b2b1e4",
   "metadata": {},
   "source": [
    "# Async Query Demo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9331cfeb",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# NOTE: This is ONLY necessary in jupyter notebook.\n",
    "# Details: Jupyter runs an event-loop behind the scenes.\n",
    "#          This results in nested event-loops when we start an event-loop to make async queries.\n",
    "#          This is normally not allowed, we use nest_asyncio to allow it for convenience.\n",
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "8a1d2821",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import time\n",
    "from llama_index import ListIndex, SimpleDirectoryReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "4466dec2",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "query_str = \"What is Paul Graham's biggest achievement?\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "6948df36",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"../paul_graham_essay/data\").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "466c3892",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "index = ListIndex.from_documents(documents)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d9115d1",
   "metadata": {},
   "source": [
    "#### By default, generate a response through hierarchical tree summarization (i.e., `response_mode=tree_summarize`) makes blocking LLM calls"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "d9ef0fef",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "39.737s\n"
     ]
    }
   ],
   "source": [
    "start_time = time.perf_counter()\n",
    "query_engine = index.as_query_engine(response_mode=\"tree_summarize\")\n",
    "query_engine.query(query_str)\n",
    "elapsed_time = time.perf_counter() - start_time\n",
    "\n",
    "print(f\"{elapsed_time:0.3f}s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9392d573",
   "metadata": {},
   "source": [
    "It takes a long time to generate a response through hierarchical tree summarization (i.e., `response_mode=tree_summarize`). This is because each LLM call is waiting for the previous one to finish. Time is waisted in waiting for an IO response.  \n",
    "\n",
    "With `async` call, instead of waiting for the response from the server, you carry on with the next requests, effectively batching together all the requests so that they can be done in parallel."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43fa6067-493e-42e5-8f6d-2c65a7a67e75",
   "metadata": {},
   "source": [
    "#### Option 1: Running `aquery` (async query call) will take advantage of async LLM calls"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "f29da0b6-7454-4661-88ab-9ebc80533743",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "13.021s\n"
     ]
    }
   ],
   "source": [
    "import asyncio\n",
    "\n",
    "start_time = time.perf_counter()\n",
    "task = query_engine.aquery(query_str)\n",
    "asyncio.run(task)\n",
    "elapsed_time = time.perf_counter() - start_time\n",
    "\n",
    "print(f\"{elapsed_time:0.3f}s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9b0f10f-8d74-4db4-a962-b2d00c575762",
   "metadata": {
    "tags": []
   },
   "source": [
    "It is faster to generate a response through `aquery` (i.e., `response_mode=tree_summarize`)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "474f82d1",
   "metadata": {},
   "source": [
    "#### Option 2: Pass in `use_async=True` to enable asynchronous LLM calls within a synchronous `query`\n",
    "\n",
    "This approach makes a synchronous `query` calls, but runs async tasks during the \"tree_summarize\" operation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "78a02987",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10.589s\n"
     ]
    }
   ],
   "source": [
    "start_time = time.perf_counter()\n",
    "query_engine = index.as_query_engine(\n",
    "    response_mode=\"tree_summarize\",\n",
    "    use_async=True,\n",
    ")\n",
    "query_engine.query(query_str)\n",
    "elapsed_time = time.perf_counter() - start_time\n",
    "\n",
    "print(f\"{elapsed_time:0.3f}s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23469128",
   "metadata": {},
   "source": [
    "It takes ~6.9s to generate a response through hierarchical tree summarization (i.e., `response_mode=tree_summarize`)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4db11d22",
   "metadata": {},
   "source": [
    "## Async Query with a list of Queries\n",
    "now suppose you wanted to query the `query_engine` with a list of queries. While earlier the bottleneck was llama_index internally waiting for each response, here the bottleneck is when you wait for the response for each call `query()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "0ffa1900",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "35.264s\n"
     ]
    }
   ],
   "source": [
    "# a list of different queries (yeah I cheated in this part)\n",
    "query_list = [query_str] * 3\n",
    "\n",
    "start_time = time.perf_counter()\n",
    "query_engine = index.as_query_engine(\n",
    "    response_mode=\"tree_summarize\",\n",
    "    use_async=True,\n",
    ")\n",
    "for q in query_list:\n",
    "    _ = query_engine.query(q)\n",
    "elapsed_time = time.perf_counter() - start_time\n",
    "\n",
    "print(f\"{elapsed_time:0.3f}s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b41df05e",
   "metadata": {},
   "source": [
    "Here async can help you."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "fe31bb79",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "13.025s\n"
     ]
    }
   ],
   "source": [
    "start_time = time.perf_counter()\n",
    "query_engine = index.as_query_engine(\n",
    "    response_mode=\"tree_summarize\",\n",
    ")\n",
    "\n",
    "# run each query in parallel\n",
    "async def async_query(query_engine, questions):\n",
    "    tasks = [query_engine.aquery(q) for q in questions]\n",
    "    r = await asyncio.gather(*tasks)\n",
    "    return r\n",
    "\n",
    "\n",
    "_ = asyncio.run(async_query(query_engine, query_list))\n",
    "elapsed_time = time.perf_counter() - start_time\n",
    "\n",
    "print(f\"{elapsed_time:0.3f}s\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
